require(tidyterra)
require(dismo)
require(tidyverse)
require(terra)
require(predicts)
require(ggnewscale)
require(mgcv)
require(randomForest)
require(maxnet)
require(enmSdmX)
require(gbm)
Today we’re going to be using the same data from the previous lab. If you recall, sampling occurred at 100 m radius sites along transects in Montana and Idaho. The transects were randomly distributed within USFWS lands, but the sampling sites on the transects were placed regularly every 300 m with approximately 10 sites on every transect. At each site, observers conducted 10-minute point counts where they recorded all birds seen or heard. Again, we’re going to be focusing on detections of Varied Thrushes (Ixoreus naevius) and using a handful of tools to build species distribution models based on those detections.
We’re going to start by importing data collected in 2004. We will use these data to build our SDMs. Technically, what we have here is presence-absence data. However, we’re going to subset out the sites where Varied Thrush were detected and treat this as a presence-only dataset.
vathData = read.csv('https://raw.githubusercontent.com/ValenteJJ/SpatialEcology/main/Week8/vath_2004.csv')
vathPres = vathData %>% filter(VATH==1)
vathAbs = vathData %>% filter(VATH==0)
vathPresXy = as.matrix(vathPres %>% select(EASTING, NORTHING))
vathAbsXy = as.matrix(vathAbs %>% select(EASTING, NORTHING))
A subset of these points were revisited in 2007-2008, and we’re going to use results from that sampling effort to validate our models.
vathVal = read.csv('https://raw.githubusercontent.com/ValenteJJ/SpatialEcology/main/Week8/vath_VALIDATION.csv')
vathValPres = vathVal %>% filter(VATH==1)
vathValAbs = vathVal %>% filter(VATH==0)
vathValXy = as.matrix(vathVal %>% select(EASTING, NORTHING))
vathValPresXy = as.matrix(vathValPres %>% select(EASTING, NORTHING))
vathValAbsXy = as.matrix(vathValAbs %>% select(EASTING, NORTHING))
To build an SDM, we need to import information on a handful of covariates that we think will be useful for describing the distribution of Varied Thrush. Thus, we’re going to pull in datasets representing elevation, canopy cover, distribution of mesic forest, and mean annual precipitation.
elev = rast('https://github.com/ValenteJJ/SpatialEcology/raw/main/Week8/elevation.tif')
canopy = rast('https://github.com/ValenteJJ/SpatialEcology/raw/main/Week8/canopy.tif')
mesic = rast('https://github.com/ValenteJJ/SpatialEcology/raw/main/Week8/mesic.tif')
precip = rast('https://github.com/ValenteJJ/SpatialEcology/raw/main/Week8/precip.tif')
crs(elev) = crs(mesic)
crs(canopy) = crs(mesic)
One challenge you’ll often run into when building SDMs is that your spatial data must share not only the same projection, but also the same extent and resolution. Using the compareGeom() function, we can evaluate whether these 4 rasters are compatible.
compareGeom(elev, canopy, stopOnError=F)
## [1] TRUE
compareGeom(elev, precip, stopOnError=F)
## [1] FALSE
compareGeom(elev, mesic, stopOnError=F)
## [1] FALSE
It turns out the answer is no. Elevation and canopy have the same properties, but precipitation and mesic are a bit different. Let’s look at elevation and mesic for an example.
elev
## class : SpatRaster
## dimensions : 2083, 1643, 1 (nrow, ncol, nlyr)
## resolution : 200, 200 (x, y)
## extent : 19165, 347765, 164300, 580900 (xmin, xmax, ymin, ymax)
## coord. ref. : +proj=aea +lat_0=44 +lon_0=-109.5 +lat_1=46 +lat_2=48 +x_0=600000 +y_0=0 +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs
## source : elevation.tif
## name : elev_km
## min value : 0.000
## max value : 3.079
mesic
## class : SpatRaster
## dimensions : 2050, 1586, 1 (nrow, ncol, nlyr)
## resolution : 210, 210 (x, y)
## extent : 16965, 350025, 153735, 584235 (xmin, xmax, ymin, ymax)
## coord. ref. : +proj=aea +lat_0=44 +lon_0=-109.5 +lat_1=46 +lat_2=48 +x_0=600000 +y_0=0 +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs
## source : mesic.tif
## name : a_pmesic
## min value : 0
## max value : 1
ggplot()+
geom_raster(data=elev, aes(x=x, y=y, fill=elev_km))+
scale_fill_gradientn(colors=c('blue', 'white', 'red'))+
new_scale_fill()+
geom_raster(data=mesic, aes(x=x, y=y, fill=a_pmesic), alpha=0.2)+
scale_fill_gradientn(colors=c('yellow', 'black', 'green'))+
coord_fixed()+
theme_bw()+
theme(panel.grid=element_blank())+
xlim(21000, 23000)+
ylim(302000, 303000)
## Warning: Removed 3422333 rows containing missing values (`geom_raster()`).
## Warning: Removed 3251276 rows containing missing values (`geom_raster()`).
I’ve zoomed in on a section of this map so you can see the problem. Because these two rasters don’t have the same origins, extents, and resolutions, the cells don’t match up. This creates a problem when we want to make predictions to cells using explanatory variables from both the mesic and elevation rasters. To solve this problem, we need to resample the mesic and precipitation rasters to have the same properties as the elevation and canopy rasters. We’ll do that with the resample() function. Recall that when we do any kind of aggregation or resampling with a raster, we’re going to have to make a decision about how to assign the values in the new raster. For the mesic raster, we’re going to use the “nearest neighbor” method, and for the precipitation raster we’ll use bilinear interpolation. We’ll also mask out the values in the new rasters for which we have no information in the elevation raster (i.e., those cells that are NAs).
mesic = resample(x = mesic, y = elev, 'near')
precip = resample(x = precip, y = elev, 'bilinear')
mesic = mask(mesic, elev)
precip = mask(precip, elev)
Just verifying that we’ve solved our problems.
compareGeom(elev, precip, canopy, mesic)
## [1] TRUE
Now, rather than using mesic as a 0/1 variable, I want to measure the amount of mesic forest within 1 km of my sampling locations. So I’m going to use our focal() tools to create a new raster from which I can extract these values.
probMatrix = focalMat(mesic, 1000, type='circle', fillNA=FALSE)
mesic1km = focal(mesic, probMatrix, fun='sum')
Next, I can combine all 5 of my explanatory variable rasters into one stack and visualize them.
layers = c(canopy, elev, mesic, mesic1km, precip)
names(layers) = c('canopy', 'elev', 'mesic', 'mesic1km', 'precip')
plot(layers)
Each of these elements may play a role in affecting the distribution of Varied Thrush, and our goal is to use the data present in all of these layers to our advantage. As a last step, we want to evaluate correlation between these covariates.
pairs(layers, maxpixels=1000)
Note the strong correlation between mesic and mesic1km. We probably shouldn’t put those variables in any model at the same time.
layers = c(canopy, elev, mesic1km, precip)
names(layers) = c('canopy', 'elev', 'mesic1km', 'precip')
For many SDM applications, we will only have presence information and will be lacking known absence data. Again, we’re trying to replicate this situation here, so we need to create our own “background” points that represent the distribution of habitat elements available on the landscape. The backgroundSample() function is useful for us here. Not only will it randomly distribute sampling locations in our space of interest, but we can input coordinates for our known presence locations so that random points do not fall in the same places.
set.seed(23)
backXy = data.frame(backgroundSample(layers, n=2000, p=vathPresXy))
ggplot()+
geom_raster(data=elev, aes(x=x, y=y, fill=elev_km))+
geom_point(data=backXy, aes(x=x, y=y))+
geom_point(data=vathPres, aes(x=EASTING, y=NORTHING), color='red', alpha=0.3)+
coord_fixed()
So, now we have our presence points, our background points, and our validation points. Let’s pull our 4 explanatory variables out of our rasters for each of those datasets, and then convert them into dataframes containing all of the information we have on each point thus far.
presCovs = extract(layers, vathPresXy)
backCovs = extract(layers, backXy)
valCovs = extract(layers, vathValXy)
presCovs = data.frame(vathPresXy, presCovs, pres=1)
backCovs = data.frame(backXy, backCovs, pres=0)
valCovs = data.frame(vathValXy, valCovs)
Finally, I’m going to quickly remove any sites that may have inadvertently fallen into a cell where we don’t have information on the explanatory variables, and then combine our presence points from 2004 with our background points.
presCovs = presCovs[complete.cases(presCovs),]
backCovs = backCovs[complete.cases(backCovs),]
valCovs = valCovs[complete.cases(valCovs),]
backCovs = backCovs %>% select(-ID)
colnames(presCovs)[1:2] = c('x', 'y')
presBackCovs = rbind(presCovs, backCovs)
Recall that envelope models only use presence data. The envelope() function (implementing the classic bioclim approach) calculates the percentiles of the observed environmental covariates at the locations where the species was detected. Values at new points are then compared to these distributions; the closer a point’s values are to the medians of the covariates, the more suitable that location is deemed. Note that even if a covariate has no real effect on presence, the model will still build an envelope around it and treat it as informative.
tmp = presCovs %>% select(elev, precip, mesic1km, canopy) %>%
as.matrix()
bioclim = envelope(tmp)
plot(bioclim, a=1, b=2, p=0.95)
plot(bioclim, a=1, b=3, p=0.95)
plot(bioclim, a=3, b=4, p=0.95)
bioclimMap = predict(layers, bioclim)
plot(bioclimMap)
It’s a rather simple tool, but it makes a lot of intuitive sense. One of the downsides is that it gives equal weight to all covariates you input, so there is no evaluation of whether a particular covariate actually influences presence/absence.
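The envelope logic can be sketched in a few lines of base R. This is a simplified illustration, not the exact predicts::envelope() implementation, and envelopeScore and toyPres are hypothetical names invented for the example:

```r
# Simplified sketch of the envelope idea: score each covariate by how
# close the value sits to the median of the presence distribution,
# then take the minimum score across covariates.
envelopeScore = function(newPoint, presMat){
  scores = sapply(seq_along(newPoint), function(j){
    p = mean(presMat[, j] <= newPoint[j]) # percentile of the new value
    2 * min(p, 1 - p)                     # 1 at the median, 0 in the tails
  })
  min(scores) # a site is only as suitable as its worst covariate
}

set.seed(1)
toyPres = cbind(elev = rnorm(100, 1, 0.2), canopy = rnorm(100, 0.5, 0.1))
envelopeScore(c(1, 0.5), toyPres) # near both medians: high suitability
envelopeScore(c(2, 0.5), toyPres) # extreme elevation: suitability is 0
```

Taking the minimum across covariates is what gives every covariate equal weight, which is the limitation discussed above.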
Let’s move onto something a bit more mathy. Here we’re going to fit a generalized linear model (logistic regression) in which we model the presence and background points as a function of our 4 covariates.
glmModel = glm(pres ~ canopy + elev + I(elev^2) + mesic1km + precip, family='binomial', data=presBackCovs)
summary(glmModel)
##
## Call:
## glm(formula = pres ~ canopy + elev + I(elev^2) + mesic1km + precip,
## family = "binomial", data = presBackCovs)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -12.566904 2.068022 -6.077 1.23e-09 ***
## canopy 0.736355 0.284090 2.592 0.009543 **
## elev 13.831002 3.376765 4.096 4.20e-05 ***
## I(elev^2) -5.886916 1.348976 -4.364 1.28e-05 ***
## mesic1km 0.788289 0.378511 2.083 0.037287 *
## precip 0.015800 0.004609 3.428 0.000608 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 773.38 on 2094 degrees of freedom
## Residual deviance: 663.56 on 2089 degrees of freedom
## AIC: 675.56
##
## Number of Fisher Scoring iterations: 8
glmMap = predict(layers, glmModel, type='response')
plot(glmMap)
One limitation of this approach is that we are limited to assuming linear and/or polynomial relationships between presence and the covariates of interest. We also have to be very explicit about any interactions that we think may exist, and again, we are limited in our ability to specify how complicated these relationships are. Nonetheless, this approach is very intuitive as well. We build a model that accurately reflects differences between presence and background sites, then use it to predict the likely probability of presence at other points.
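To see how intuitive the GLM predictions are, we can reproduce one by hand from the fitted coefficients printed in the summary above. The site values below are hypothetical, chosen only for illustration:

```r
# Fitted coefficients copied from the glm summary above
b = c(int = -12.566904, canopy = 0.736355, elev = 13.831002,
      elev2 = -5.886916, mesic1km = 0.788289, precip = 0.015800)

# Hypothetical covariate values for a single site
site = c(canopy = 0.8, elev = 1.2, mesic1km = 0.6, precip = 100)

# Linear predictor (log-odds), including the quadratic elevation term
logOdds = unname(b['int'] + b['canopy']*site['canopy'] +
  b['elev']*site['elev'] + b['elev2']*site['elev']^2 +
  b['mesic1km']*site['mesic1km'] + b['precip']*site['precip'])

plogis(logOdds) # inverse-logit gives the predicted probability of presence
```

This is exactly what predict(glmModel, ..., type='response') does for every cell of the raster stack.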
GAMs do a much better job of capturing non-linear relationships between explanatory variables and presence because they use smoothing splines to model those relationships. Below I am specifying a basis dimension of 6 (k=6) for each covariate, which allows a lot of flexibility in the shape of each fitted relationship.
gamModel = gam(pres ~ s(canopy, k=6) + s(elev, k=6) + s(mesic1km, k=6) + s(precip, k=6), family='binomial', data=presBackCovs, method='ML')
summary(gamModel)
##
## Family: binomial
## Link function: logit
##
## Formula:
## pres ~ s(canopy, k = 6) + s(elev, k = 6) + s(mesic1km, k = 6) +
## s(precip, k = 6)
##
## Parametric coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -4.1358 0.2628 -15.74 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Approximate significance of smooth terms:
## edf Ref.df Chi.sq p-value
## s(canopy) 1.000 1.000 5.469 0.01936 *
## s(elev) 2.883 3.478 29.631 1.16e-05 ***
## s(mesic1km) 1.000 1.000 0.194 0.65999
## s(precip) 3.587 3.936 23.672 0.00013 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## R-sq.(adj) = 0.0799 Deviance explained = 17.7%
## -ML = 333.19 Scale est. = 1 n = 2095
gamMap = predict(layers, gamModel, type='response')
plot(gamMap)
Despite this ability to add complexity, GAMs are still unable to capture quite as much of the non-linearity as other approaches described below.
boostModel = gbm(pres ~ elev + canopy + mesic1km + precip, distribution='bernoulli', n.trees=100, interaction.depth=2, shrinkage=0.1, bag.fraction=0.5, data=presBackCovs)
boostMap = predict(layers, boostModel, type='response')
## Using 100 trees...
##
## Using 100 trees...
##
## Using 100 trees...
boostMap = mask(boostMap, layers$canopy)
plot(boostMap)
Random forest models create a large number of classification trees, each fit to a bootstrap sample of the data we feed in. There are two parameters we need to choose when fitting our model. The first is mtry, the number of explanatory variables randomly sampled as candidates at each split in a tree. The second is ntree, the number of trees in the forest. We can start by using the tuneRF() function to help us identify the best value of mtry.
tuneRF(y = as.factor(presBackCovs$pres), x=presBackCovs[,3:6], stepFactor = 2, ntreeTry = 500)
## mtry = 2 OOB error = 4.15%
## Searching left ...
## mtry = 1 OOB error = 4.3%
## -0.03448276 0.05
## Searching right ...
## mtry = 4 OOB error = 4.11%
## 0.01149425 0.05
## mtry OOBError
## 1.OOB 1 0.04295943
## 2.OOB 2 0.04152745
## 4.OOB 4 0.04105012
Here the OOB error is actually lowest at an mtry value of 4, but the improvement over mtry = 2 falls below tuneRF()’s default threshold (improve = 0.05), so we’ll use mtry = 2 to fit our model. We’ll leave ntree at its default value of 500.
rfModel = randomForest(as.factor(pres) ~ canopy + elev + mesic1km + precip, data=presBackCovs, mtry=2, ntree=500, na.action = na.omit)
rfMap = predict(layers, rfModel, type='prob', index=2)
plot(rfMap)
Once again, we can predict the values from our fitted random forest model to our landscape.
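The bagging-and-voting idea underlying the random forest can be sketched with a toy example. Each "tree" here is just a stub that predicts the majority class of its bootstrap sample (real trees would also split on covariates, sampling mtry candidates at each split); treeVote and the counts are invented for illustration:

```r
set.seed(42)
y = c(rep(1, 30), rep(0, 70)) # toy presence/background labels

treeVote = function(y){
  boot = sample(y, replace = TRUE)  # bootstrap sample of the data
  as.numeric(mean(boot) > 0.5)      # stub "tree": majority class
}

votes = replicate(500, treeVote(y)) # 500 trees, like our ntree
mean(votes) # forest prediction: the fraction of trees voting "presence"
```

This vote fraction is what predict(rfModel, ..., type='prob') returns for each raster cell.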
Finally, we’re going to try a Maxent model. Recall that Maxent should ONLY be used with presence-only data. There are a few parameters here that we can tweak as well. The first is regmult, a regularization constant; as its value increases, a greater penalty is imposed on coefficients that do not explain much variability in presence locations. The second is classes, which specifies the types of basis functions (feature classes) the model may use; we can supply any combination of “lqpht” (linear, quadratic, product, hinge, or threshold).
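As a simplified illustration of one of those feature classes, a "hinge" feature is a piecewise-linear ramp, max(0, x - k), placed at a knot k, which lets the fitted response bend at that knot (maxnet additionally rescales its hinge features; the names below are invented for the example):

```r
hinge = function(x, k) pmax(0, x - k) # zero below the knot, linear above

x = seq(0, 3, length = 100)  # e.g., elevation in km
knots = c(0.5, 1, 1.5, 2)
hingeFeatures = sapply(knots, function(k) hinge(x, k))
dim(hingeFeatures) # 100 x values by 4 hinge features
```

A weighted sum of many such ramps can approximate almost any monotone bend in the response, which is why hinge features make Maxent so flexible.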
pbVect = presBackCovs$pres
covs = presBackCovs %>% select(canopy:precip)
maxentModel = maxnet(p = pbVect,
data= covs,
regmult = 1,
classes='lqpht')
plot(maxentModel, type='logistic')
maxentMap = predictMaxNet(maxentModel, layers, type='logistic')
par(mfrow=c(1,1))
plot(maxentMap)
As a final step here, let’s look at the partial plots to examine the relationship between predicted occurrence and the environmental variables.
tmp = expand.grid(elev = seq(min(backCovs$elev), max(backCovs$elev), length=1000),
canopy = mean(backCovs$canopy),
precip = mean(backCovs$precip),
mesic1km = mean(backCovs$mesic1km))
elevData = data.frame(bioclim = predict(bioclim, tmp),
glm = predict(glmModel, tmp, type='response'),
gam = predict(gamModel, tmp, type='response'),
boost = predict(boostModel, tmp, type='response'),
rf = predict(rfModel, tmp, type='prob')[,2],
maxent = predict(maxentModel, tmp, type='logistic')[,1]) %>%
cbind(tmp) %>%
select(bioclim:elev) %>%
pivot_longer(bioclim:maxent) %>%
mutate(variable = 'elevation')
## Using 100 trees...
tmp = expand.grid(elev = mean(backCovs$elev),
canopy = seq(min(backCovs$canopy), max(backCovs$canopy), length=1000),
precip = mean(backCovs$precip),
mesic1km = mean(backCovs$mesic1km))
canopyData = data.frame(bioclim = predict(bioclim, tmp),
glm = predict(glmModel, tmp, type='response'),
gam = predict(gamModel, tmp, type='response'),
boost = predict(boostModel, tmp, type='response'),
rf = predict(rfModel, tmp, type='prob')[,2],
maxent = predict(maxentModel, tmp, type='logistic')[,1]) %>%
cbind(tmp) %>%
select(bioclim:maxent, canopy) %>%
pivot_longer(bioclim:maxent) %>%
mutate(variable = 'canopy')
## Using 100 trees...
tmp = expand.grid(elev = mean(backCovs$elev),
canopy = mean(backCovs$canopy),
precip = seq(min(backCovs$precip), max(backCovs$precip), length=1000),
mesic1km = mean(backCovs$mesic1km))
precipData = data.frame(bioclim = predict(bioclim, tmp),
glm = predict(glmModel, tmp, type='response'),
gam = predict(gamModel, tmp, type='response'),
boost = predict(boostModel, tmp, type='response'),
rf = predict(rfModel, tmp, type='prob')[,2],
maxent = predict(maxentModel, tmp, type='logistic')[,1]) %>%
cbind(tmp) %>%
select(bioclim:maxent, precip) %>%
pivot_longer(bioclim:maxent) %>%
mutate(variable = 'precipitation')
## Using 100 trees...
tmp = expand.grid(elev = mean(backCovs$elev),
canopy = mean(backCovs$canopy),
precip = mean(backCovs$precip),
mesic1km = seq(min(backCovs$mesic1km), max(backCovs$mesic1km), length=1000))
mesicData = data.frame(bioclim = predict(bioclim, tmp),
glm = predict(glmModel, tmp, type='response'),
gam = predict(gamModel, tmp, type='response'),
boost = predict(boostModel, tmp, type='response'),
rf = predict(rfModel, tmp, type='prob')[,2],
maxent = predict(maxentModel, tmp, type='logistic')[,1]) %>%
cbind(tmp) %>%
select(bioclim:maxent, mesic1km) %>%
pivot_longer(bioclim:maxent) %>%
mutate(variable = 'mesic1km')
## Using 100 trees...
colnames(elevData)[1] = colnames(canopyData)[1] = colnames(precipData)[1] = colnames(mesicData)[1] = 'xValue'
tmp = rbind(elevData, canopyData, precipData, mesicData)
ggplot(tmp, aes(x=xValue, y=value, color=name))+
facet_wrap(~variable, scales='free_x')+
geom_line()+
theme_bw()+
theme(panel.grid=element_blank())
There’s a LOT to unpack here. One of the first things you’ll notice is the general difference in the magnitude of the predicted values. Bioclim is modeling similarity, while Maxent’s logistic predictions are scaled so that the average prediction at presence locations is 0.5. GLM, GAM, BRT, and RF are all discriminating presence from background points, so their predicted values represent the probability that a point is occupied. Thus, those probabilities are going to fall as you increase the number of background points.
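A quick bit of arithmetic shows why the discrimination-based predictions shrink as background points are added: for a fixed set of presences, the sample prevalence (which anchors an intercept-only logistic model) falls as the background sample grows. The counts here are arbitrary toy numbers:

```r
nPres = 500                               # toy presence count
prevalence2k  = nPres / (nPres + 2000)    # with 2000 background points
prevalence10k = nPres / (nPres + 10000)   # with 10000 background points
c(prevalence2k, prevalence10k) # the same presences look rarer against more background
```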
You’ll also note that the relationships between the covariates and occurrence tend to be complex for RF, bioclim, and BRT models and much more smooth for the others. After spring break we will be discussing how to interpret these differences and identify the best SDM for your purposes.
Varied Thrush are considered forest-dependent, and thus one might characterize mesic forests as “habitat” for the species. Calculate the total amount of mesic forest in the study area, and the mean size of the mesic forest patches.
Using the SDM built from the random forest model, convert the landscape into “habitat” and “non-habitat.” To do this, choose a threshold value in your SDM and convert all cells with predicted outcomes greater than this threshold to 1 and all cells with predicted values below your threshold to 0. Justify your choice of your threshold value. Now calculate the total amount of habitat and mean size of habitat patches based on this new raster (i.e., create patches of “habitat” based on aggregations of cells you deemed 1). How do the habitat amount and patch size values compare between the mesic forest approach and the SDM-based approach? In what situations might you rely on one map over the other?
# Plot Mesic Forest
mesic_binary <- ifel(mesic == 1, 1, 0)
plot(mesic_binary, main="Binary Mesic Forest Raster", col=c("white", "green"), legend=TRUE, axes=TRUE)
# Set up Mesic Forest Patches
mesic_patches <- patches(mesic_binary, directions=8, zeroAsNA=T)
plot(mesic_patches, main="Mesic Patches", legend=TRUE)
# Sum mesic forest values
mesic_forest_cells <- sum(values(mesic) == 1, na.rm = TRUE)
non_mesic_forest_cells <- sum(values(mesic) != 1, na.rm = TRUE)
mesic_percent_rounded <- round((mesic_forest_cells/(mesic_forest_cells+non_mesic_forest_cells)*100),2)
# Calculate the area of a single cell
cell_area_square_meters <- res(mesic)[1] * res(mesic)[2] # Cell resolution (width * height) = 200 m x 200 m
cell_area_sqkm <- cell_area_square_meters/1000000 # Convert to square kilometers
# Calculate total mesic forest area
total_mesic_area <- mesic_forest_cells * cell_area_sqkm
# Calculate mean patch size
mesic_patch_areas <- table(values(mesic_patches))
mesic_mean_patch_size_sqkm <- mean(mesic_patch_areas * cell_area_sqkm)
mesic_mean_patch_size_roundedsqkm <- round(mesic_mean_patch_size_sqkm, 2)
print(paste("In our study area, mesic forest covers", total_mesic_area, "square kilometers, which accounts for", mesic_percent_rounded, "% of the study area."))
## [1] "In our study area, mesic forest covers 40217 square kilometers, which accounts for 36.82 % of the study area."
# Output results
print(paste("Furthermore, the mean patch size of mesic forest is", mesic_mean_patch_size_roundedsqkm, "square kilometers."))
## [1] "Furthermore, the mean patch size of mesic forest is 7.49 square kilometers."
# Plot Habitat Threshold above 0.5 & apply threshold to convert probabilities to binary habitat classification
rf_binary <- ifel(rfMap > 0.5, 1, 0)
plot(rf_binary, main="Binary rf Habitat Raster", col=c("white", "green"), legend=TRUE, axes=TRUE)
# Identify habitat patches
habitat_patches <- patches(rf_binary, directions=8, zeroAsNA=T)
plot(habitat_patches, main="Habitat Patches", legend=TRUE)
# Calculate total habitat area (rf_binary has the same 200 m resolution)
total_habitat_area_sqkm <- sum(values(rf_binary) == 1, na.rm = TRUE) * cell_area_sqkm
# Calculate mean patch size
patch_areas <- table(values(habitat_patches))
mean_patch_size_sqkm <- (mean(patch_areas * cell_area_sqkm))
mean_patch_size_roundedsqkm <- round(mean_patch_size_sqkm, 2)
# Output results
print(paste("According to the random forest model (with a >0.5 threshold for habitat), the total habitat area is", total_habitat_area_sqkm, "square kilometers, while the mean patch size is", mean_patch_size_roundedsqkm, "square kilometers."))
## [1] "According to the random forest model (with a >0.5 threshold for habitat), the total habitat area is 288.88 square kilometers, while the mean patch size is 0.11 square kilometers."
ANSWER: First, deciding on a threshold value is critically important when converting an SDM map to a binary habitat/non-habitat map, and the value should be selected carefully. In this case, I chose a threshold of 0.5, given that the maximum value generated by the random forest model was ~1. If selecting a threshold somewhat arbitrarily, I think the best starting place is the center of the range of predicted values when pseudo-absences were used in the model. If true absences were used, this would not be the case, because mean occupancy values could be very high or very low, and for good reason.
As shown in my outputs above, mesic forest covers 40217 square kilometers and has a mean patch size of 7.49 square kilometers. In contrast, the random forest model (with a >0.5 threshold for habitat) limits the total habitat area to 288.88 square kilometers, with a mean patch size of 0.11 square kilometers. This discrepancy in what could constitute VATH habitat is massive. Of course, using mesic forest as the only habitat metric likely leads to a vast overestimate, while the random forest model, filtered at a >0.5 threshold, may be far too selective and under-represent the habitat that is actually used.
If creating a coarse species extent map or deciding on a general region to study or manage for VATH, relying on the mesic map may be more appropriate at a local/regional scale. If trying to pinpoint VATH for capture or nest monitoring, using the highly selective RF model may be more appropriate to rely on during initial targeting of efforts (again on a local or regional scale, as it would be inappropriate to use either map for other areas).
When we fit the Maxent model in the lab, we used a regularization constant of 1. Fit the model two more times, using regularization (regmult) constants of 0.5 and 3. Construct figures showing the relationship between the 4 explanatory variables and the predicted outcome from these 3 fitted Maxent models. What is the regularization constant doing? Hint: you may need to Google it.
# Set Up
pbVect = presBackCovs$pres
covs = presBackCovs %>% select(canopy:precip)
# Models
maxentModel.5 = maxnet(p = pbVect,
data= covs,
regmult = 0.5,
classes='lqpht')
maxentModel = maxnet(p = pbVect,
data= covs,
regmult = 1,
classes='lqpht')
maxentModel3 = maxnet(p = pbVect,
data= covs,
regmult = 3,
classes='lqpht')
# Set up Data Frames
tmp = expand.grid(elev = seq(min(backCovs$elev), max(backCovs$elev), length=1000),
canopy = mean(backCovs$canopy),
precip = mean(backCovs$precip),
mesic1km = mean(backCovs$mesic1km))
elevData = data.frame(elev = tmp$elev,
maxent.5 = predict(maxentModel.5, tmp, type='logistic')[,1],
maxent = predict(maxentModel, tmp, type='logistic')[,1],
maxent3 = predict(maxentModel3, tmp, type='logistic')[,1]) %>%
pivot_longer(-elev, names_to = "model", values_to = "prediction") %>%
mutate(variable = 'elevation')
tmp = expand.grid(elev = mean(backCovs$elev),
canopy = seq(min(backCovs$canopy), max(backCovs$canopy), length=1000),
precip = mean(backCovs$precip),
mesic1km = mean(backCovs$mesic1km))
canopyData = data.frame(canopy = tmp$canopy,
maxent.5 = predict(maxentModel.5, tmp, type='logistic')[,1],
maxent = predict(maxentModel, tmp, type='logistic')[,1],
maxent3 = predict(maxentModel3, tmp, type='logistic')[,1]) %>%
pivot_longer(-canopy, names_to = "model", values_to = "prediction") %>%
mutate(variable = 'canopy')
tmp = expand.grid(elev = mean(backCovs$elev),
canopy = mean(backCovs$canopy),
precip = seq(min(backCovs$precip), max(backCovs$precip), length=1000),
mesic1km = mean(backCovs$mesic1km))
precipData = data.frame(precip = tmp$precip,
maxent.5 = predict(maxentModel.5, tmp, type='logistic')[,1],
maxent = predict(maxentModel, tmp, type='logistic')[,1],
maxent3 = predict(maxentModel3, tmp, type='logistic')[,1]) %>%
pivot_longer(-precip, names_to = "model", values_to = "prediction") %>%
mutate(variable = 'precipitation')
tmp = expand.grid(elev = mean(backCovs$elev),
canopy = mean(backCovs$canopy),
precip = mean(backCovs$precip),
mesic1km = seq(min(backCovs$mesic1km), max(backCovs$mesic1km), length=1000))
mesicData = data.frame(mesic1km = tmp$mesic1km,
maxent.5 = predict(maxentModel.5, tmp, type='logistic')[,1],
maxent = predict(maxentModel, tmp, type='logistic')[,1],
maxent3 = predict(maxentModel3, tmp, type='logistic')[,1]) %>%
pivot_longer(-mesic1km, names_to = "model", values_to = "prediction") %>%
mutate(variable = 'mesic1km')
# Combine and format for plotting
colnames(elevData)[1] = colnames(canopyData)[1] = colnames(precipData)[1] = colnames(mesicData)[1] = 'xValue'
combined_data = rbind(elevData, canopyData, precipData, mesicData)
# Plot
ggplot(combined_data, aes(x=xValue, y=prediction, color=model))+
facet_wrap(~variable, scales='free_x')+
geom_line()+
theme_bw()+
theme(panel.grid=element_blank())